Add jupyter notebook with common neighbours metric on football dataset #342

Merged: 3 commits merged into master from n/cn_metric on Feb 13, 2019


@iandioch iandioch commented Feb 12, 2019

Connects to #304.

This PR implements the common neighbours metric and runs it on one of the Pajek datasets used in the paper in #313. The notebook loads the dataset, computes the similarity matrix, and computes the AUC (area under the receiver operating characteristic curve), which measures how the true positive rate trades off against the false positive rate. The formula for calculating the AUC (based on n1, n2, n3) comes from the same paper.
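For anyone reading without opening the notebook, here is a minimal sketch of the two computations, not the notebook's actual code. It assumes an undirected graph stored as a 0/1 adjacency matrix, and it assumes that n1, n2 and n3 count the sampled comparisons in which the held-out edge scored higher than, equal to, or lower than a non-existent edge. That reading of n1, n2, n3 is an assumption and should be checked against the paper. The function names and the toy graph are made up for illustration.

```python
import numpy as np


def common_neighbours(adj):
    """Return S where S[i][j] is the number of neighbours shared by i and j."""
    adj = np.asarray(adj, dtype=int)
    # (A @ A)[i][j] counts the nodes adjacent to both i and j.
    return adj @ adj


def auc_from_counts(n1, n2, n3):
    """AUC from comparison counts.

    Assumed meaning (check against the paper): n1 = held-out edge scored
    higher, n2 = scores tied, n3 = held-out edge scored lower.
    """
    return (n1 + 0.5 * n2) / (n1 + n2 + n3)


# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
S = common_neighbours(adj)
print(S[0][3])                   # 1: nodes 0 and 3 share neighbour 2
print(auc_from_counts(6, 2, 2))  # 0.7
```

Note that the diagonal of S is just each node's degree, so it should be ignored when ranking candidate links.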

This is added in a new research/ directory, which we can use for a lot of non-prod code in future I guess: experiments, explorations, etc.

What the actual implementation of this metric in Rabble will need is a microservice that computes a similarity matrix, with some extra bits: e.g. updating the matrix regularly (as our follow graph changes), an API for actually getting recommendations from the similarity matrix (i.e. return some number of values of j where S[i][j] is maximal for a given i), and maybe a slightly different accuracy measurement, depending on how easy it is to apply AUC to our own database. Some of the code here might be replicated there later, IDK for sure.
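The recommendation lookup described above could look roughly like the sketch below. It is only an illustration under assumptions: `recommend` is a hypothetical name, the graph is treated as undirected, and skipping users that i already follows is a design choice that the eventual service may or may not want.

```python
import numpy as np


def recommend(similarity, adj, i, k=3):
    """Return up to k indices j with the highest similarity[i][j]."""
    scores = np.array(similarity[i], dtype=float)  # copy of row i
    scores[i] = -np.inf                       # never recommend i to itself
    scores[np.asarray(adj[i]) > 0] = -np.inf  # skip users i already follows
    # Highest score first; the stable sort breaks ties by lower index.
    order = np.argsort(-scores, kind="stable")
    return [int(j) for j in order[:k] if np.isfinite(scores[j])]


# Toy undirected graph with edges 0-1, 0-2, 1-2, 2-3.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
S = adj @ adj  # common neighbours similarity matrix

print(recommend(S, adj, 3))  # [0, 1]
print(recommend(S, adj, 0))  # [3]
```

Sorting the whole row is fine at this scale. For a large follow graph, something like `np.argpartition` or a sparse matrix would probably be needed, but that is outside what this PR covers.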

Our results in this notebook:
our results

vs. the paper's results with this metric on the same dataset:
the paper's results

Compare the Average row of our output with the [n1 n2 n3 AUC] in the Average column from the paper.

@iandioch

FYI, GitHub will actually render an ipynb file if you click the View file button in the changeset, so you don't necessarily need to run Jupyter yourself to review.

@devoxel devoxel left a comment

It's pretty hard to review in its current form. It isn't production code, obviously, but it'd be nice if it was easier to read.

EDIT: Missed how to view the file properly

research/common_neighbours_football.ipynb (review thread resolved)
@iandioch iandioch merged commit 69f6c88 into master Feb 13, 2019
@iandioch iandioch deleted the n/cn_metric branch February 13, 2019 19:30
@devoxel devoxel mentioned this pull request Feb 14, 2019